A Scientist’s Outlook for 2019 – From Demystifying AI to Weaponizing Machine Learning
Let me start by saying that I’m a scientist, which means that predictions about what will happen in 2019 are difficult for me to present in a scientific manner. Predictions, of course, come with a substantial level of uncertainty.
Let me also say that these predictions reflect what I am involved in – what my company, my research, or my hacking activity focuses on. They are not offered as a comprehensive list that considers all aspects of cybersecurity.
And with that, let’s dive right in.
1. AI Will Be Demystified
You probably know the scene in “The Wizard of Oz” when Toto, Dorothy’s dog, pulls back the curtain and reveals that the very scary and mysterious Wizard of Oz is actually just a human being behind a machine using special effects. In the year ahead, we will see a lot of focus on demystifying artificial intelligence.
In the past few years, there has been some abuse of the term “artificial intelligence,” essentially drawing a curtain around AI – the analogy could not be more pertinent – in order to hide the internals of how certain security solutions detect threats.
Oftentimes you hear, “We use mathematical models,” or, “We have an intelligent system that is able to substitute the intelligence of an analyst.” All these statements, in my opinion, are a bit extreme. The reality is that while AI can be a powerful technology, today it can only be effective as a support for human decision-making. We’re years away from a truly stand-alone AI system that can learn and operate on its own.
Artificial intelligence is a very big field, and within it there are subgroups of technologies. What made AI so well known and popular is machine learning, a subset of AI: a particular way of implementing AI, but not the only way. Within machine learning, there is a set of technologies called deep learning, which uses techniques based on neural networks to perform a number of difficult tasks.
In machine learning, there are usually two main approaches: supervised machine learning, which involves some human guidance (typically in the form of labeled training data), and unsupervised machine learning, in which a machine is essentially left to figure out the structure of the data for itself.
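To make the distinction concrete, here is a minimal sketch in Python. It assumes a single made-up feature (connections per minute) and deliberately tiny algorithms: a nearest-centroid classifier for the supervised case and a one-dimensional two-means clustering for the unsupervised case. The function names and the data are hypothetical, chosen only for illustration; real products use far richer features and models.

```python
from statistics import mean

def train_supervised(samples, labels):
    """Supervised: learn one centroid per human-provided label."""
    centroids = {}
    for label in set(labels):
        centroids[label] = mean(x for x, y in zip(samples, labels) if y == label)
    return centroids

def predict(centroids, x):
    """Assign x to the label whose centroid is closest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

def cluster_unsupervised(samples, rounds=10):
    """Unsupervised: split 1-D data into two groups without any labels."""
    low, high = min(samples), max(samples)
    for _ in range(rounds):
        near_low = [x for x in samples if abs(x - low) <= abs(x - high)]
        near_high = [x for x in samples if abs(x - low) > abs(x - high)]
        if not near_high:  # all samples identical: nothing to split
            break
        low, high = mean(near_low), mean(near_high)
    return low, high

# Hypothetical data: connections per minute observed for eight hosts.
samples = [2, 3, 4, 3, 90, 95, 100, 92]
labels = ["benign"] * 4 + ["malicious"] * 4  # the human guidance

centroids = train_supervised(samples, labels)
print(predict(centroids, 5), predict(centroids, 80))
print(cluster_unsupervised(samples))
```

Both approaches find the same two groups here, but only the supervised one knows what the groups mean, because a human supplied the labels.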
Of course, one important aspect of AI is that what you learn depends on the data that you learn from. If you have bad data, the system will learn the wrong thing even though the algorithm could be very sophisticated.
I expect that in the coming year, organizations will be more selective, and will want to be more informed about the AI technologies they implement. They will want to demystify AI, to look behind the curtain. They will not be satisfied when vendors say, “Oh, we use machine learning. We use AI. Trust us. It’s working.”
They will ask the right questions, such as, “What specific approaches are you using?” Even within a certain approach – machine learning, deep learning, supervised, unsupervised – “What kinds of algorithms are you using? What data are you using for training?”
This will be great for the community because once you pull back the curtain and you do not allow vendors to generically point to artificial intelligence as the silver bullet to solve every security problem, you will be able to actually compare different approaches. As a result, users will have realistic expectations for what AI and AI-based systems can do.
You can say vendor X is actually using this particular algorithm that happens to have great results when it is fed a lot of data, but it requires a lot of learning time. This other algorithm, on the other hand, can get to good results in a short time, but in the long run is more prone to false positives and false negatives. I predict that customers will become more educated, more sensitive towards the nuances that are behind the curtains of AI.
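One way an informed customer might compare two detectors is to measure their false positive and false negative rates on the same labeled test set. The sketch below assumes boolean verdicts (True means "flagged as a threat") and uses invented results for two hypothetical detectors; it is an illustration of the comparison, not a benchmark of any real product.

```python
def error_rates(predictions, truth):
    """Return (false_positive_rate, false_negative_rate) for boolean verdicts."""
    false_pos = sum(p and not t for p, t in zip(predictions, truth))
    false_neg = sum(t and not p for p, t in zip(predictions, truth))
    negatives = truth.count(False)
    positives = truth.count(True)
    fpr = false_pos / negatives if negatives else 0.0
    fnr = false_neg / positives if positives else 0.0
    return fpr, fnr

# Hypothetical ground truth: four real threats followed by four benign samples.
truth = [True, True, True, True, False, False, False, False]
# Detector A catches every threat but raises two false alarms.
detector_a = [True, True, True, True, True, True, False, False]
# Detector B raises no false alarms but misses two threats.
detector_b = [True, True, False, False, False, False, False, False]

print(error_rates(detector_a, truth))
print(error_rates(detector_b, truth))
```

Neither detector is simply "better": which trade-off is acceptable depends on how costly a missed threat is compared with an analyst chasing a false alarm.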
2. Security Will Increase the Focus on East-West Traffic
Typically, security solutions focus on traffic from inside the network to the outside, often referred to as “North-South traffic.” North-South solutions include web application firewalls, network sandboxes, intrusion detection systems, and malware detection systems. They sit between the internal network and the outside world and observe the interaction between the two.
However, there’s so much more information that can be gained by looking at how information is exchanged within a network and understanding the patterns of internal network behavior – the East-West traffic.
For example, there could be a subnetwork in which users access a database service elsewhere on the internal network. If a security solution is only observing the North-South traffic, it will not be able to capture these important relationships.
These relationships are important because advanced threats are bypassing North-South defenses through IoT devices, phishing attacks, personal devices compromised off site, and more. A new group of Network Traffic Analysis (NTA) systems uses machine learning to model how a network is used, how services relate to each other, the type and quantity of information that is exchanged, and when services are accessed.
All this internal data can be used to create a baseline of normal behavior and activity, and then to identify outliers that could be evidence of an attack. However, this is only possible if NTA tools have access to the East-West traffic.
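As a rough illustration of the baseline-and-outlier idea, the sketch below models one East-West flow with nothing more than a mean and a standard deviation, and flags values that sit too far from the mean. The traffic figures, the function names, and the threshold of three standard deviations are all assumptions made for this example; actual NTA systems model many more dimensions.

```python
from statistics import mean, stdev

def build_baseline(history):
    """Summarize past observations as (mean, standard deviation)."""
    return mean(history), stdev(history)

def is_outlier(value, baseline, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mu, sigma = baseline
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold

# Hypothetical megabytes per hour sent from a user subnet to an internal database.
history = [120, 130, 125, 118, 122, 127, 131, 119]
baseline = build_baseline(history)

print(is_outlier(124, baseline))  # a typical hour
print(is_outlier(900, baseline))  # a sudden bulk transfer
```

Note that the history here is internal traffic. A tool that only sees North-South traffic would never collect it, so it could not build this baseline at all.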
3. Artificial Intelligence Will Be Weaponized
The moment the bad guys realize that the good guys are using machine learning and AI to detect outliers, their immediate reaction is, “How can I break this? How can I escape detection? How can I make my tasks look normal so they are not detected as outliers? What can I change in my attacks so this automated classifier will think that this is a benign sample?”
We’re starting to see a more systematic approach in which the bad guys are creating mutation engines that explore how a detection system reacts to various outliers in order to find the sweet spot where a certain element is misclassified or considered normal and not an outlier by the AI-based system.
This is a very difficult problem. It’s difficult because many of the techniques and algorithms that have been developed for machine learning have been developed in environments like image recognition, natural language processing, and vision where the data being analyzed was not actively resisting classification.
When you’re classifying pictures of animals, you don’t have somebody putting into your data set misclassified samples on purpose to ruin your classification process. However, when an AI system is learning from a security data set, attackers are providing a large portion of the data. These are attacks, malware samples, various threats. They are controlled by the attacker, and therefore, can influence your system’s learning. You cannot apply verbatim the techniques that have been developed for natural language processing and image recognition to security.
In the scientific world, we call this adversarial machine learning: What your system is trying to learn is actually resisting characterization. For example, an attacker might pollute your data set. An attacker might even steal the models that you have learned, so they can analyze them offline and find a hole that allows them to produce samples that will escape the correct characterization.
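To show why a polluted data set matters, here is a deliberately simplified sketch. It assumes a toy detector whose decision threshold is just the midpoint between the average "benign" score and the average "malicious" score, and an attacker who manages to get a few crafted samples accepted into the benign training data. All names and numbers are hypothetical; real detectors and real poisoning attacks are considerably more complex.

```python
from statistics import mean

def train_threshold(benign, malicious):
    """Toy detector: the threshold is the midpoint of the two class means."""
    return (mean(benign) + mean(malicious)) / 2

def is_flagged(score, threshold):
    """A sample is flagged when its score exceeds the learned threshold."""
    return score > threshold

benign = [1, 2, 3, 2]        # scores of legitimate samples
malicious = [9, 10, 11, 10]  # scores of known threats

clean_threshold = train_threshold(benign, malicious)

# Attacker-controlled samples that end up in the benign training data.
poison = [8, 8, 8, 8]
poisoned_threshold = train_threshold(benign + poison, malicious)

attack_score = 7
print(clean_threshold, is_flagged(attack_score, clean_threshold))
print(poisoned_threshold, is_flagged(attack_score, poisoned_threshold))
```

The algorithm is unchanged between the two runs. Only the training data moved, and that was enough for the same attack to slip under the threshold.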
Imagine, for example, somebody breaking into a network. But instead of immediately starting a very loud and easy-to-detect port scan, they simply listen to the traffic. The attacker then builds a model of what’s normal in the network and creates attacks that are conformant to that model.
It could be as simple as, “Hey, I realized that people here are using the network mostly between 8:00 am and 5:00 pm. Guess what? I’m going to make all my traffic appear in this timeframe. I’m not going to start downloading a large data set at 3:00 am.”
4. Criminals Will Target Cloud Workloads
As there is a continuous movement from on-premises architectures and solutions to cloud-based solutions, there will also be a transition from attacks against internal networks to attacks against the cloud.
We have seen sophisticated attacks break into servers running in the cloud. One technique exploits the fact that cloud-based solutions use tiered APIs to keep the system very open. A misconfigured cloud workload allows too much access to the API services, exposing information that shouldn’t be accessed directly and in turn creating opportunities for denial-of-service attacks or remote compromise.
Another problem is that on-premises, data-center-style approaches are often transferred to the cloud, which doesn’t always work. For example, there could be a VPN server that is supposed to be accessed only from a protected data center. When it is moved to the cloud, it becomes globally accessible. Leaking the credentials to that VPN server then allows full access to the entire cloud system.
Imagine an attacker who not only breaks into servers but is able to break into the console of a cloud management system. Now the attacker is able to fire up hundreds and hundreds of servers that start mining cryptocurrency, for example. By doing that, the attacker will use an enormous amount of resources before anybody at the victim organization realizes that they have 50,000 additional servers running in the cloud.
Not only do the attackers get to run their Monero mining loads or tasks, but they also inflict a huge penalty in terms of cloud time on the victim. This is going to be a problem. We will need new security solutions. In particular, we will need more cloud network visibility.
We will need the ability to see what’s happening in an organization’s cloud workflows, not only in terms of what workloads are instantiated on the network but also how they communicate with each other. Is the communication pattern conformant to what the architecture of the system is supposed to implement?
This requires either an agent on the host or a dedicated instance to collect this traffic. There will be a lot of pressure on security vendors to develop novel solutions that are able to provide the necessary network visibility in the cloud, without killing the performance of the cloud workloads.
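The conformance question raised above, whether workloads talk to each other the way the architecture intends, can be sketched as a comparison between observed flows and an allow-list. The three-tier layout, the workload names, and the `unexpected_flows` helper are assumptions made for illustration; collecting the observed flows is the hard part that an agent or dedicated instance would have to handle.

```python
# Hypothetical intended architecture: web talks to api, api talks to db.
ALLOWED_FLOWS = {("web", "api"), ("api", "db")}

def unexpected_flows(observed, allowed=ALLOWED_FLOWS):
    """Return the distinct (source, destination) pairs the design does not allow."""
    return sorted({flow for flow in observed if flow not in allowed})

# Hypothetical flows collected from the cloud environment.
observed = [
    ("web", "api"),
    ("api", "db"),
    ("web", "db"),   # the web tier reaching the database directly
    ("web", "api"),
]

print(unexpected_flows(observed))
```

A direct web-to-database flow may be a misconfiguration or an attacker moving laterally. Either way, it is only visible if the traffic between workloads is being observed.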
5. Attacks Against the IoT Device Universe Will Become More Sophisticated
Another huge problem of 2019 will be caused by IoT devices, which already have been attacked extensively. However, what we see, especially from the research point of view, is a new interest in using novel techniques to analyze the firmware of complex systems.
If you look at the big IoT compromises, so far they have been very simple attacks. The Mirai botnet just used default account credentials on a bunch of IP-enabled devices.
Today we are seeing completely novel techniques that look at the firmware running on IoT devices. Initially, we saw attacks that used traditional forms of input, like sending a message to a smart light bulb to turn it on. That’s an intended input, and it’s probably handled well most of the time. We are now starting to see techniques that look at how sensors, and information from the hardware itself, can be used to interact with the internals of an IoT device and bring that device to an unsafe state.
An example would be compromising the WiFi chip or the radio chip on a specific IoT device and using the radio chip to go back inside the IoT device firmware and take control of it. Another thing that we see happening is a more sophisticated exploration of the interaction among multiple devices and among multiple processes on a single device.
A lot of the attacks that we have seen so far on IoT devices are fundamentally based on input/output relationships. I send this input, I receive this output. Maybe I find a vulnerability, maybe not. Now, we see the ability to extract the firmware from IoT devices and decouple it from the hardware, so that an attacker can run many instances of the same firmware without having to buy many instances of the hardware, and then launch large-scale, sophisticated attacks that exploit the very deep vulnerabilities uncovered this way.
Typically, these devices are difficult, if not impossible, to upgrade, and therefore compromising IoT devices will become a bridge into a protected network.
6. Data From Personal Health Devices Will Be Compromised
We have so many devices nowadays that collect information about who we are and what we do. Our phones have accelerometers. We wear Fitbits. We wear smartwatches.
All this information gets saved because we want to see how many steps we take and what our heart rate is when we sleep and when we don’t. I predict that in 2019 we’ll start to see reports of this data being compromised. Of course, the theft of a person’s step counts for a year might not look like something to worry about. My concern is that these health data breaches will end up being coalesced into very large databases of personal health information.
Criminals will consolidate information about what you do, your DNA, and more. If they can gather enough information, they can, for example, provide complete health projections on single individuals to third parties like insurance companies.
This would be an enormous violation of privacy. Knowing that somebody, because he has a certain pattern of walking over the course of a year, is 50 percent more likely to have a certain disease could be used by organizations to filter or price services.
I am very concerned about the coalescing of data from many breaches. We have seen this already, but not with health data. Recently, a mega data set was uncovered that is thought to be the coalescing, the grouping, of many data breaches that exposed usernames, passwords, email credentials, credit card data, and more.
I think that if this happens for health data, we will have a new category of problems on our hands, and we’re not well-equipped yet to deal with it.
In closing, I predict that 2019 will be a fun year, because for us researchers, learning is fun. We’ll learn a lot about new attack techniques and the resulting defense measures, and about which of these predictions ultimately play out and which end up as the ramblings of a university professor and a technologist.