5 Truths about AI – Truth #2: The World is Too Complex for Linear Classifiers
In the first post in this series, I described the first truth about using AI for cybersecurity: Anomalies Aren’t (Necessarily) Threats. I explained how unsupervised machine learning, which many Network Traffic Analysis (NTA) solutions use for anomaly detection, generates both false positives and false negatives. I went on to describe how supervised machine learning does a better job of distinguishing benign anomalies from malicious ones, and I used linear classifiers as an example to illustrate the point. That made it possible to explain clearly why supervised ML is superior to unsupervised ML for detecting cyber threats, but the example was perhaps a bit too simplistic.
The second truth is that the real world (and real networks) are too complex for linear classifiers.
A linear classifier makes classifications based on a linear combination of its input values. Basically, you take a bunch of values, like the number of bytes transferred or the time a connection took, and use them as input features for your classifier. In your model, each input feature gets a certain weight. When you have to classify a specific input, you multiply each feature value by its weight and sum everything up. Then, you compare this result to a threshold. In the equation below, we have the features denoted by x, the weights by w, and the threshold by T.
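Written out in a standard form that matches this description, the decision rule looks like the following. The index i and the feature count n are notation assumed here, and the direction of the comparison (flagging an input when the sum exceeds T) is just one possible convention:

```latex
\[
\sum_{i=1}^{n} w_i x_i \;=\; w_1 x_1 + w_2 x_2 + \dots + w_n x_n \;>\; T
\]
```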
When you build a linear classifier (that is, when you train the model), you learn the proper weight for each feature, together with the right threshold. Then, when you perform detection, you check whether the sum of your weighted features is smaller (or larger) than the threshold. If so, you classify the input as good (or bad). In creating such a linear classifier, you’re basically creating a line or, in a high-dimensional input space, a plane or hyperplane. And you are saying that everything on one side of that hyperplane is good, and everything on the other side is bad.
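As a rough illustration of the detection step, here is a minimal sketch in Python. The two features, the weights, and the threshold are hypothetical values picked by hand for the example; in a real system they would be learned during training.

```python
from typing import Sequence


def weighted_sum(features: Sequence[float], weights: Sequence[float]) -> float:
    """Linear combination of the inputs: w1*x1 + w2*x2 + ... + wn*xn."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(w * x for w, x in zip(weights, features))


def classify(features: Sequence[float], weights: Sequence[float], threshold: float) -> str:
    """Return "bad" if the weighted sum exceeds the threshold, otherwise "good"."""
    return "bad" if weighted_sum(features, weights) > threshold else "good"


# Hypothetical model: features are (megabytes sent, connection duration in seconds).
WEIGHTS = (0.5, 0.25)
THRESHOLD = 10.0

print(classify((4.0, 30.0), WEIGHTS, THRESHOLD))   # 0.5*4 + 0.25*30 = 9.5, below the threshold
print(classify((30.0, 20.0), WEIGHTS, THRESHOLD))  # 0.5*30 + 0.25*20 = 20.0, above the threshold
```

The first connection lands on the "good" side of the line and the second on the "bad" side. Nothing else about the input matters to the classifier, only which side of the hyperplane it falls on.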
The equation makes it look complicated, but the idea is actually very simple, and, not surprisingly, it is used by a lot of security products. Unfortunately, it’s not a very powerful classifier.
Linear Classifiers Make Too Many Mistakes
To illustrate, let’s look at what happens if you get the red (malicious) and green (benign) data points shown in Figure 1. How would you draw a straight line that separates the red and the green cleanly? Obviously, you can’t. It’s impossible, so there’s no linear classifier for this example that can, with 100 percent accuracy, separate red from green; malicious from benign.
If the only thing you have is a linear classifier, you would have to find the best possible line, the one that makes the fewest mistakes, but it would still make a bunch of mistakes.
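To make this concrete, here is a minimal sketch in Python. The four data points are a made-up toy set (the classic XOR pattern), not the data behind the figures, and the grid of candidate weights and thresholds is an arbitrary choice. The code simply tries a few thousand straight-line classifiers and reports how well the best one does.

```python
from itertools import product

# Hypothetical toy data in the XOR pattern: (x1, x2) -> 1 (malicious) or 0 (benign).
POINTS = [((0.0, 0.0), 0), ((1.0, 1.0), 0), ((0.0, 1.0), 1), ((1.0, 0.0), 1)]


def accuracy(w1: float, w2: float, threshold: float) -> float:
    """Fraction of points that the linear rule w1*x1 + w2*x2 > threshold gets right."""
    correct = 0
    for (x1, x2), label in POINTS:
        predicted = 1 if w1 * x1 + w2 * x2 > threshold else 0
        correct += predicted == label
    return correct / len(POINTS)


# Try every combination of two weights and a threshold on a coarse grid from -2 to 2.
GRID = [i / 4 for i in range(-8, 9)]
best_accuracy = max(accuracy(w1, w2, t) for w1, w2, t in product(GRID, repeat=3))

print(best_accuracy)
```

On this toy data, the best any of these lines can manage is three out of the four points. That is not a limitation of the grid: no choice of weights and threshold can get all four right, because the two malicious points sit on opposite corners from each other.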
Or you can use a more complex classifier, one that can model nonlinear relationships. It could create a decision boundary like the one shown in Figure 2.
This will be something that is much more complex, but this complexity (or power) is needed to create an accurate distinction between what’s good and what’s bad. Some systems achieve this by applying logistic regression to nonlinear transformations of the features, or by using a neural network. (Plain logistic regression on the raw features still draws a linear boundary.) These more complex machine learning algorithms can capture nonlinear relationships. That’s very important. The effectiveness of an AI solution depends on the sophistication of its algorithms and classifiers, and on choosing the proper algorithm given the nature of the data.
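As a small illustration of what that extra power buys, here is a sketch of a tiny neural network in Python. It is an assumption-laden toy: the data is a made-up XOR pattern, and the weights are set by hand rather than learned, purely to show that a nonlinear model can represent a boundary that no single line can.

```python
# Hypothetical toy data in the XOR pattern: (x1, x2) -> 1 (malicious) or 0 (benign).
POINTS = [((0.0, 0.0), 0), ((1.0, 1.0), 0), ((0.0, 1.0), 1), ((1.0, 0.0), 1)]


def relu(value: float) -> float:
    """Rectified linear unit: the nonlinearity that makes the model more than a line."""
    return max(0.0, value)


def tiny_network(x1: float, x2: float) -> int:
    """Two hidden ReLU units followed by a linear output and a threshold.

    The weights below are hand-picked for this example, not trained.
    """
    h1 = relu(x1 + x2)          # positive as soon as either feature is on
    h2 = relu(x1 + x2 - 1.0)    # positive only when both 0/1 features are on
    score = h1 - 2.0 * h2       # the second unit cancels the first at (1, 1)
    return 1 if score > 0.5 else 0


predictions = [tiny_network(x1, x2) for (x1, x2), _ in POINTS]
labels = [label for _, label in POINTS]

print(predictions == labels)
```

All four points are classified correctly here. The hidden units bend the decision boundary so that the two benign corners end up on one side and the two malicious corners on the other.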
If you just use simple linear classifiers, you will not get very powerful supervised machine learning models. Too many systems basically look at a few features, take an average, maybe look at some standard deviations, and compare the result to a threshold. This could be something like “every time a host sends more than three times the amount of traffic that it usually does, or if that host connects to more than N machines (where N might be five), send an alert.” It’s very simple. Too simple to be effective. It cannot capture complex network traffic, and there will be numerous false positives and missed threats (false negatives).
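The rule quoted above fits in a couple of lines of Python, which is part of the problem. In the sketch below, the function name and the three observations are invented for illustration; they are not measurements from a real network.

```python
def simple_rule_alert(bytes_sent: int, usual_bytes: int, hosts_contacted: int, max_hosts: int = 5) -> bool:
    """Alert if traffic exceeds 3x the usual volume or more than max_hosts machines are contacted."""
    return bytes_sent > 3 * usual_bytes or hosts_contacted > max_hosts


# Hypothetical observations:
# (description, bytes sent, usual bytes, hosts contacted, actually malicious?)
observations = [
    ("nightly backup job", 9_000_000, 1_000_000, 2, False),
    ("low-and-slow data theft", 1_200_000, 1_000_000, 1, True),
    ("port scan", 50_000, 1_000_000, 40, True),
]

false_positives = [
    name for name, sent, usual, hosts, malicious in observations
    if simple_rule_alert(sent, usual, hosts) and not malicious
]
false_negatives = [
    name for name, sent, usual, hosts, malicious in observations
    if not simple_rule_alert(sent, usual, hosts) and malicious
]

print("False positives:", false_positives)
print("False negatives:", false_negatives)
```

In this made-up sample the rule catches the noisy port scan, raises an alert on a harmless backup, and stays silent on the quiet data theft, which is exactly the mix of false positives and missed threats described above.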
When you’re considering network security solutions, be aware that many products that claim to use some type of machine learning or AI actually rely on very simple statistical models. Ask for details about how they’re actually using AI.
So, hopefully I’ve convinced you that supervised learning is good, and that you need something more sophisticated than a linear classifier. That’s great. And this leads to my next truth, about what is needed to make those nonlinear classifiers truly work: data. I’ll discuss that in my next post.
Please see the other posts in this series:
5 Truths about AI in Cybersecurity – Truth #1: Anomalies Aren’t (Necessarily) Threats
5 Truths about AI in Cybersecurity – Truth #3: Good Training Data Can Be Hard To Get
Coming Soon:
5 Truths about AI in Cybersecurity – Truth #4: We Need a Signal in the Data to Train AI
5 Truths about AI in Cybersecurity – Truth #5: AI Can be attacked