4 Cyberattacks That You Would Miss Without AI
Moore’s Law, named for Gordon Moore of Intel fame, says that computational capability will double every 18 to 24 months. And we’ve seen that unfold over the last 30 years (see chart). It has stoked people’s imaginations, so much so that many believe the promise of artificial intelligence (AI) could become reality, and computers could actually learn to think like humans.
I believe that day is still a number of years away, but the idea is fueling a lot of hype around AI: what it’s truly capable of, where it can be effective, and what it takes to implement it have all become somewhat inflated in today’s market.
Not all of the hype is hype. AI has delivered an incredible amount of benefit to society. For example, AI has enabled great progress in weather prediction, both in storm tracking and in understanding weather phenomena; speech recognition and real-time translation are other areas where we’ve seen major advances.
But what about AI for cybersecurity?
The number one concern among security teams is the sophistication of threats. Adversaries use evasive, polymorphic attacks so that a new attack never looks like a prior attack. Criminals also automate their attacks, all of which is completely inundating IT environments.
The other thing we see is that IT environments are becoming more complex. By 2020, nearly 70 percent of all IT workloads are expected to run in the public cloud. Billions of devices are added to the IoT population every year. More and more organizations are embracing BYOD. A greater percentage of traffic these days is encrypted. All of this means that locking down your network becomes less and less feasible.
As if that’s not enough, we also see a terrible talent shortage. When you bring all of this together, you might conclude that AI is not only a viable solution, but it may be the only solution. If you have floods of traffic and the requirement to dispatch that traffic in some sort of an intelligent way, AI seems to fit the bill.
And while not all AI-based solutions live up to the hype, we have seen cases where AI is essential to detecting some very sophisticated attacks: attacks we’d miss without it.
But when people say AI, many assume all AI is the same, when in fact it’s not. AI is a broad field. It includes expert systems, supervised machine learning, unsupervised machine learning, and deep learning.
I’d now like to describe four attacks and the specific AI techniques that are essential to detect them.
1. Emotet

Emotet is a modular banking Trojan. And while it’s been around for years, it’s actually gaining speed. What’s interesting about Emotet is that it’s polymorphic: every time it’s deployed, it changes itself. And it’s self-obfuscating, so it’s impossible to detect using any kind of signature-based model.
To help explain how AI can detect Emotet, and the other attacks I’ll describe, I’ll use a metaphor.
For Emotet, I’ll use the parable of the six blind men and the elephant. Each of them puts out their hand and experiences the elephant in a different way. One grabs the tail and is convinced that he’s holding onto a rope. One grabs the tusk and is convinced that he’s holding onto a saber. One touches the leg and is utterly convinced that he’s holding onto a tree.
Of course, all of them are wrong because their ability to sense the elephant is very limited. They don’t see the full picture. They don’t collaborate, using their communal experience of knowing what an elephant is, or in our case, knowing what a piece of malware is.
Detecting Emotet requires supervised machine learning and the use of expert systems. The supervised machine learning is represented by the people touching the elephant in different areas and collecting different pieces of data. Then the software can make the determination that it’s indeed an elephant because it has been trained on what an elephant looks like.
More specifically to Emotet, the first indication may be a suspicious data upload. In and of itself, it’s not highly malicious. But then there’s a suspicious remote task being scheduled. And when I use AI to put those together with a malicious document attachment, now I recognize it as Emotet.
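The idea of weak signals adding up to a confident verdict can be sketched as a weighted scoring of indicators. This is a minimal illustration, not any vendor’s actual model; the indicator names, weights, and threshold below are all hypothetical stand-ins for what a trained supervised model would learn.

```python
# Illustrative sketch: no single indicator is conclusive, but in
# combination they cross a detection threshold. Names, weights, and
# the threshold are hypothetical.

WEIGHTS = {
    "suspicious_data_upload": 0.3,
    "suspicious_remote_task": 0.3,
    "malicious_document_attachment": 0.5,
}
THRESHOLD = 0.9  # deliberately higher than any single indicator's weight

def score(observed_indicators):
    """Sum the learned weights of the indicators we observed."""
    return sum(WEIGHTS.get(i, 0.0) for i in observed_indicators)

def is_emotet_like(observed_indicators):
    """Flag only when the combined evidence is strong enough."""
    return score(observed_indicators) >= THRESHOLD
```

A suspicious upload alone scores 0.3 and stays below the threshold; upload plus scheduled task plus malicious attachment scores 1.1 and triggers the detection, mirroring how the individual observations only become "Emotet" in combination.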
2. Mirai IoT Botnet
Mirai was an IoT botnet that installed itself remotely, largely on consumer IoT devices including security cameras, smart thermostats, and connected doorbells.
On October 21, 2016, this attack brought down Internet access across a large part of the Northeast, California, and other areas (see map). We lost access to a lot of things, including Twitter, Netflix, some business-critical services, and industrial automation and control systems.
Now, the question is, what type of AI insight would you use to determine whether or not an attack such as this was afoot? The answer is unsupervised machine learning.
The AI engine trains on data. All day long, it’s watching the network, observing normal traffic moving east-west and north-south, and clustering together similar activity to figure out what is normal.
Using another analogy, what most of it looks like is cats. They may be different color cats, different breeds, different sizes. But they’re all cats with typical cat characteristics – whiskers, four paws, fur, etc. I don’t know if a cat is good or bad, but I have baselined my system on cats.
Then, I start to see what AI experts call drift. What I’ve been seeing starts to change. For instance, here I see some dogs entering the flow. I may see a saber-toothed tiger. They also have whiskers, four paws, and fur, but they’re not cats.
These new data elements look like anomalies, and I still don’t know if they’re good or bad, but I know that they earn the distinction of needing further examination.
This is exactly how AI would look at network activity. If I look at the normal flow of traffic to and from a device – for instance, a WiFi-enabled printer – the AI would recognize that the vast majority of normal traffic is inbound to the IoT device manager.
Under a malicious scenario, such as the Mirai botnet attack, my unsupervised machine learning is able to surface that a disproportionate amount of the traffic is moving in different directions and moving to completely different devices.
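A minimal sketch of that baselining idea, assuming we summarize each device’s traffic as the fraction that flows outbound per hour: learn the normal range from history, then flag observations that drift far outside it. The statistics and threshold here are illustrative, not a real detection model.

```python
# Illustrative sketch of unsupervised drift detection: baseline the
# outbound share of a device's traffic, then flag large deviations.
# The z-score threshold is a hypothetical choice.
from statistics import mean, stdev

def build_baseline(samples):
    """samples: fraction of the device's traffic that is outbound, per hour."""
    return mean(samples), stdev(samples)

def is_drift(baseline, observation, z_threshold=3.0):
    """Flag an observation far outside the learned normal range."""
    mu, sigma = baseline
    return abs(observation - mu) > z_threshold * sigma
```

A printer that normally sends about 5 percent of its traffic outbound, to its device manager, suddenly sending 90 percent outbound to unfamiliar hosts is exactly the kind of anomaly this surfaces. Like the dogs among the cats, the detector doesn’t know the drift is malicious; it only knows it deserves further examination.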
3. Loki Bot
Loki Bot is a credential-stealing Trojan. It’s most famous, perhaps, for installing on Android devices. Most notably, in March of 2017, we saw Loki Bot preloaded into the stock Android operating system on a large number of phone and tablet shipments. It beacons out the keystroke logs it captures from user activity, whether on your Android phone or tablet or on your PC keyboard.
This type of attack can be mitigated with biometrics and two-factor authentication, but not everybody has those for every application or device. Fortunately, AI is the perfect technology to defend against Loki Bot: not by using supervised or unsupervised ML alone, but by using both together.
Think of a particle collider, where atoms are smashed into each other at high velocity. When scientists do that, they’re looking at the spray pattern of the particles after the collision. They want to take a lump of matter and break it down into its elemental components.
What we see are the telltale signs of certain atomic elements. In addition, sometimes little particles spin off: the unexpected anomalies. Those might be bosons or quarks splitting off of the atom, undiscovered particles that weren’t supposed to be there. Supervised ML detects what’s expected, and you need unsupervised ML to detect the unexpected.
Now, how does that work in the case of Loki Bot? It’s actually pretty simple. Unsupervised ML that has created a baseline of what’s normal is able to see anomalous command and control traffic. Command and control traffic happens all the time, but this is anomalous command and control traffic.
That would be that particle, that quark that’s spinning off into space. Then we also see anomalous behavior based on how the AI has been trained, the supervised ML. For instance, AI can detect a similarity to a known malicious object.
That means I’ve seen it before. This is an element, a code segment I have seen before, given that criminals are very efficient and like to reuse segments of code that have worked well in the past. Software can figuratively smash a file, or elements of lateral traffic in a network, and use supervised ML to look for components, including code segments, that may be malicious or may have been reused from somewhere else.
It’s not enough to do one or the other. You need to do both. And you can detect Loki Bot every time if you do.
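The "you need both" logic can be sketched as a verdict that requires the unsupervised and supervised signals to agree. The two detector functions below are hypothetical stand-ins for real models: one flags command-and-control traffic well above a learned baseline, the other flags strong resemblance to a known malicious object.

```python
# Illustrative sketch: neither signal alone is conclusive for Loki Bot;
# the verdict requires both. Function names and thresholds are
# hypothetical stand-ins for trained models.

def unsupervised_flags_anomaly(c2_events_per_hour, baseline_per_hour, tolerance=3.0):
    """Unsupervised side: C2 traffic far above the learned baseline is anomalous."""
    return c2_events_per_hour > tolerance * baseline_per_hour

def supervised_flags_similarity(similarity_to_known_malware, threshold=0.8):
    """Supervised side: strong resemblance to a known malicious object."""
    return similarity_to_known_malware >= threshold

def loki_bot_verdict(c2_events_per_hour, baseline_per_hour, similarity):
    # The quark spinning off (anomaly) AND the recognized element (similarity).
    return (unsupervised_flags_anomaly(c2_events_per_hour, baseline_per_hour)
            and supervised_flags_similarity(similarity))
```

Anomalous beaconing with no known-bad resemblance stays a curiosity; resemblance without anomalous traffic might be dormant code. Only the conjunction yields the confident detection.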
4. DMSniff

This is one I personally fell victim to. DMSniff is Point of Sale (POS) malware. It literally installs onto a POS device that you use to swipe your credit card. Most interesting, I think, is that this one went undiscovered for more than four years, and during that time it stole two million credit card numbers, a big haul in the world of cyberattacks.
To detect DMSniff using AI requires a layered approach using deep learning and supervised machine learning.
Think about the TV show “Cold Case.” There are all these unsolved cases, and in many cases, detectives have DNA samples that they collected from the crime scene.
DNA is a great way to identify an attacker IF we have a previously collected sample against which to compare it. But typically we’re not able to associate DNA collected at a crime scene with a known sample that identifies the perpetrator.
Then something interesting happened a couple of years ago – the availability of home DNA testing kits like 23andMe. Millions of people have done this, expanding the set of samples against which crime scene DNA is compared.
Leveraging AI, particularly deep learning, I am able to determine, with a high degree of confidence, that the crime scene DNA may not be a perfect match, but it’s a very close match to a known sample. A relative.
Hackers are highly collaborative. They share routines that they know are effective, and those routines have telltale signals. Much as people share brown eyes or other physical attributes, hackers share elements of code. In DMSniff, the obfuscated command-and-control traffic, the beaconing, is essentially the carrier, while the keylogging payload contains the credit card information.
That would be the element that I would see repeated over and over and over again. Even though it’s not an exact match, it’s got those same brown eyes.
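One simple way to make the "close relative, not exact match" idea concrete is to compare overlapping byte n-grams between a sample and a known malicious sample. This is a toy similarity measure, far simpler than a deep learning model; the n-gram size and Jaccard threshold are hypothetical.

```python
# Illustrative sketch: a sample need not be identical to known malware
# to be flagged; sharing enough code "DNA" is suspicious on its own.
# The n-gram size and threshold are hypothetical.

def ngrams(data, n=4):
    """All overlapping byte n-grams in the sample."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}

def family_similarity(sample, known_sample, n=4):
    """Jaccard similarity of the two samples' n-gram sets."""
    a, b = ngrams(sample, n), ngrams(known_sample, n)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def is_close_relative(sample, known_sample, threshold=0.5):
    # Not a perfect match, but the same "brown eyes."
    return family_similarity(sample, known_sample) >= threshold
```

A lightly modified variant of a known sample scores near 1.0 and gets flagged; unrelated code shares almost no n-grams and scores near zero.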
Furthermore, supervised machine learning sees abusive domain-generation behaviors and low-bandwidth data exfiltration in the code and knows these are bad behaviors. I take those bad behaviors in combination with my deep learning capability, and I’m able to determine that this is in fact DMSniff. I can step in and remediate it.
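As one concrete example of a domain-generation signal, algorithmically generated domains tend to have far higher character entropy than human-chosen ones. The heuristic below is an illustrative stand-in for one feature a trained model might use, not a complete DGA detector; the entropy threshold is hypothetical.

```python
# Illustrative sketch: flagging likely machine-generated domains by
# Shannon entropy of the first label. The threshold is hypothetical,
# and a real model would combine many such features.
from collections import Counter
from math import log2

def entropy(label):
    """Shannon entropy (bits per character) of a string."""
    counts = Counter(label)
    n = len(label)
    return -sum(c / n * log2(c / n) for c in counts.values())

def looks_generated(domain, threshold=3.5):
    """High-entropy first labels suggest a domain-generation algorithm."""
    label = domain.split(".")[0]
    return entropy(label) >= threshold
```

A dictionary-word domain like "google.com" sits well under the threshold, while a random string like "xkqwjvbzmtpldhrn.com" sails past it, which is the kind of behavioral tell the supervised layer picks up.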
Ultimately, it’s about pulling together all these different types of AI to come up with a composite of what’s actually happening in the network and understanding whether or not an attack is underway.
All of the hype that’s out there has given, in some cases, AI a bad name. People are saying, “Please don’t talk to me about AI.” I think AI is exactly what we need to be talking about.
Given the challenges I described earlier, the volume of attacks, and these sophisticated attacks, AI is definitely part of the answer. And it’s important to understand that AI is not all the same; that different types of AI help with different types of security, different types of data, and different outcomes or detections. Ultimately, leveraging artificial intelligence is going to help us regain control and get back to doing things that are more productive for our companies.