Machine Learning Is Transforming Malware Detection
Machine learning is an important component in detecting advanced malware, but to be effective it must be well-grounded with known threat intelligence.
Dr. Giovanni Vigna, Co-founder and CTO of Lastline, presented his thoughts regarding advanced malware protection at this year’s RSA conference in San Francisco. He spoke about how machine learning is transforming malware detection, especially when it is well-grounded with known threat intelligence.
During his presentation, Dr. Vigna outlined malware detection’s evolution from technologies based on models of known bad malware, to more recent methodologies that model good network behavior.
Technologies that use signatures or models of previously recognized malware are quite effective in detecting well-known malicious objects. However, they have two serious shortcomings. First, they require an enormous amount of time from skilled security experts to develop, and second, they can’t detect new strains of malware. To address these inadequacies, there’s a new breed of malware detection tools based on modeling good or normal network behavior.
The idea is that by developing models of normal network behavior, it becomes easy to spot abnormal behaviors. The assumption is that normal events are good, and that abnormal events must therefore be bad, or at least suspicious.
Machine Learning Can Spot Deviations
Machine learning and artificial intelligence are very good at processing massive amounts of data and building models of normal behavior. Machine learning can also spot deviations from those models with relative ease. In theory, this solves the two major inadequacies of tools that rely on models of previously known malware. First, it's no longer necessary to develop signatures or complex models of existing malicious objects—thereby liberating precious security staff for other tasks. Second, the technology can detect new malware strains.
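The approach described above can be sketched with a deliberately minimal model: learn a baseline of "normal" from historical observations, then flag values that deviate too far from it. This is a hypothetical illustration (the data and the outbound-connection-count feature are invented); production systems use far richer features and models.

```python
# Minimal sketch of anomaly detection: learn a baseline from history,
# then flag observations that deviate too far from it (z-score style).
import statistics

def build_baseline(samples):
    """Learn the 'normal' mean and spread from historical observations."""
    return statistics.mean(samples), statistics.stdev(samples)

def is_anomalous(value, mean, stdev, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    return abs(value - mean) > threshold * stdev

# Made-up hourly outbound-connection counts for one host.
history = [98, 102, 100, 97, 103, 101, 99, 100, 102, 98]
mean, stdev = build_baseline(history)

print(is_anomalous(101, mean, stdev))  # typical hour -> False
print(is_anomalous(250, mean, stdev))  # sudden spike -> True
```

Note that the model only says an event is *unusual*; as the next section discusses, unusual is not the same as malicious.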
Good Behavior Versus Bad Behavior
On the surface, modeling normal network behavior appears to be the holy grail of malware detection. Unfortunately, it's not that simple. Normal network events are not always good, and abnormal events are not always bad. For example, the first time an employee works at two o'clock in the morning is an anomaly, but not necessarily a threat—it might simply be a legitimate late night. Likewise, a normal-looking data transfer might actually be insider theft. The basic assumptions that normal is good and abnormal is bad don't hold up, at least not all the time.
So, unless enhanced, a machine learning tool will by its very nature generate a lot of false positives and false negatives. False positives flood the security staff with events to investigate; tuning the detector to generate fewer alerts comes at the risk of missing actual malicious events.
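The trade-off can be made concrete with a toy detector over made-up traffic numbers (the baseline, events, and thresholds below are all invented for illustration): a tight threshold drowns analysts in alerts on benign jitter, while a loose one stays quiet but lets a genuine spike slip through.

```python
# Hypothetical illustration of the alerting trade-off: a tight threshold
# produces many alerts (mostly benign noise); a loose threshold produces
# none, silently missing the one real exfiltration spike (130).
def count_alerts(events, baseline, threshold):
    """Count events whose deviation from the baseline exceeds the threshold."""
    return sum(1 for v in events if abs(v - baseline) > threshold)

baseline = 100
# Mostly benign jitter around the baseline, plus one modest malicious spike.
events = [92, 108, 111, 130, 96, 113, 89, 107]

print(count_alerts(events, baseline, threshold=5))   # -> 7 (alert fatigue)
print(count_alerts(events, baseline, threshold=35))  # -> 0 (attack missed)
```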
A Well-grounded System Is Crucial
To address these issues, machine learning algorithms must be well-grounded. Dr. Vigna defined this concept as machine learning algorithms that detect abnormal behaviors but also utilize models that understand known threats—thereby refining actions or alerts. In a well-grounded system, an anomaly detection engine identifies all unusual behaviors, and those unusual behaviors are then tested against known malicious characteristics and behaviors. A large file transfer at two o'clock in the morning might be an anomaly, but it's probably benign unless it's also associated with something that's known to be malicious.
To be effective at detecting advanced malware, anomalous events must be further tested for associations with known malicious entities or capabilities, such as:
- Known compromised hosts
- Known malicious IP addresses or geographic locations
- Anonymizing networks
- Unusual encryption capabilities
- Known C&C (command and control) systems
- Other known adversaries
- Unauthorized services or other processes
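A well-grounded triage step of this kind can be sketched as follows. This is a hypothetical sketch, not Lastline's implementation: the indicator sets, event fields, and triage labels are all invented, and the malicious IP uses a documentation address range.

```python
# Hypothetical sketch of "well-grounding": an anomalous event escalates to
# an alert only if it also matches known threat intelligence. All
# indicators and event fields here are invented for illustration.
KNOWN_MALICIOUS_IPS = {"203.0.113.66"}   # example/documentation IP range
KNOWN_C2_DOMAINS = {"evil-c2.example"}   # invented C&C domain

def triage(event):
    """Escalate only anomalies that are tied to known-bad indicators."""
    if not event["anomalous"]:
        return "ignore"
    if (event.get("dest_ip") in KNOWN_MALICIOUS_IPS
            or event.get("domain") in KNOWN_C2_DOMAINS):
        return "alert"
    return "review"  # anomalous but not tied to known threats: low priority

late_night_backup = {"anomalous": True, "dest_ip": "198.51.100.7"}
c2_beacon = {"anomalous": True, "dest_ip": "203.0.113.66"}

print(triage(late_night_backup))  # -> "review" (anomaly alone)
print(triage(c2_beacon))          # -> "alert" (anomaly + known-bad IP)
```

The design point is the one from the talk: the anomaly detector supplies candidates, and threat intelligence decides which candidates deserve an analyst's attention.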
In summary, Dr. Vigna made a compelling presentation. Machine learning and artificial intelligence are adding exciting new malware detection capabilities. However, unless these tools are well-grounded with intelligence about known threats and the way they operate, they are not effective.