From Russia(?) with Code
The Olympic Destroyer cyberattack is a very recent and notable attack by sophisticated threat actors against a globally renowned 2-week sporting event that takes place once every four years in a different part of the world. Successfully attacking the Winter Olympics requires motivation, planning, resources and time.
Cyberattack campaigns are often a reflection of real world tensions and provide insight into the possible suspects in the attack. Much has been written about the perpetrators behind Olympic Destroyer emanating from either North Korea or Russia. Both have motivations. North Korea would like to embarrass its sibling South Korea, the holders of the 23rd Winter Olympics. Russia could be seeking revenge for the IOC ban on their team. And Russia has precedence, having previously been blamed for attacks on other sporting organizations, such as the intrusion at the World Anti Doping Agency that was targeted via a stolen International Olympic Committee account.
But attribution is not just hard, it’s often a wilderness of mirrors and, more often than not, a bit anticlimactic.
The motivation of our following analysis is not to point the finger of blame about who did the attacking, but to utilize our expertise in analyzing malware code and understanding the behaviors it exhibits to highlight the heritage, evolution and commonalities we found in the code of the Olympic Destroyer malware.
Initial Samples of Code Reuse
Besides analyzing the behavior of a sample, our sandbox performs several levels of code analysis, eventually extracting all code components, regardless if they are run at run-time or not. As we described in a blog post a few years ago, this technique is essential if we are to detect any dormant functionality that might be present within the sample.
After decomposing the code components in normalized basic blocks, the sandbox computes smart code hashes that are stored and indexed in our threat intelligence knowledge base. Over the last 3 years we have been collecting code hashes for millions of files, so when we want to hunt for other samples related to the same actor, we are able to query our backend for any other binaries that have been reusing significant amounts of code.
The rationale being that actors usually build up their code base over time, and reuse it over and over again across different campaigns. Code surely might evolve, but some components are bound to remain the same. This is the intuition that drove our investigation on Olympic Destroyer further. The first results were obviously some variants of the Olympic Destroyer binaries which we have already mentioned in our previous post. However, it quickly got way more interesting.
A very specific code hash led us through this process: 7CE26E95118044757D3C7A97CF9D240A (Lastline customers can use it to query our Global Threat Intelligence Network). This rare code hash surprisingly linked 21ca710ed3bc536bd5394f0bff6d6140809156cf, a payload of the Olympic Destroyer campaign, with some other samples of a remote access trojan, “TVSpy.” Though the actual internal name of the threat is TVRAT, the malware is known and labelled in VirusTotal as Trojan.Pavica or Trojan.Mezzo, none of which were previously connected to the original Olympic Destroyer campaign.
Figure 1 shows the actual code referenced by the code hash: it is a function used to read a buffer, and subsequently parse PE header from it.
This is not where code re-usage ends, as the actual function referencing and invoking the following fragment (see Figure 2) also shares almost all of the same logic. This function is responsible for loading PE file from the memory buffer and executing an entry point.
A Deeper Dive Based on Unusual Code
We decided to further investigate this piece of code since loading PE from memory is not all that common. Its origin opened several questions:
- Why is that piece of code the only link between the two samples?
- Were there any other samples sharing the same code?
Our first discovery was a Remote Access trojan called TVSpy, mentioned above. This family has been the subject of a few previous research investigations, and a recent Benkow Lab blog post (from November 2017) even reported that the source code was available on github.
Unfortunately, all links to github are now dead. But that didn’t stop us from finding the actual source code (or at least evidence that it was indeed published at some point). Apparently it was sold for $US500 on an underground Russian forum in 2015. Even though the original post and links are gone, a Russian information security forum kept a copy of the source code package alongside a description of the original sale announcement (see Figure 3).
Not Enough – The Investigation Continued
Although interesting, this connection was eventually not enough to connect Olympic Destroyer to Russia or to TVSpy. So we kept digging. Further research finally identified the code in Figures 1 and 2 to be part of an open source project called LoadDLL (see Figure 4) and available on codeproject.com (first published back in March 2014).
However, a couple things still didn’t add up: why had we only managed to identify samples from 2017 even if the source code was released in 2014? What about older versions of TVSpy? How come our search didn’t return any of those samples? Were Olympic Destroyer and TVSpy samples from 2017 sharing more than just the LoadDLL code?
Apparently TVSpy went through a few transformations. Samples from 2015 did embed and use the LoadDLL code, but the compiler did some specific optimizations that made the code unique (see Figure 5). In particular the compiler optimized out both “flags” (not used in the function) and “read_proc” (statically link function) from the parameters of LoadDll, but it couldn’t optimize out a “if (read_proc)” check even though it is useless since “read_proc” is not passed as a parameter anymore.
The “read_proc” function itself is also identical to one from source code (see Figures 6 and 7) and as you can see in Figure 8, it also gets called exactly the same way as the original source code from codeproject.com.
The most interesting aspect for us is in fact the version of TVSpy that dates back to 2017-2018 and shares with Olympic Destroyer almost the exact binary code of LoadDLL. You can see LoadDll_LoadHeaders for those samples in Figure 9: as you might notice the function looks different then the one from the older version (see Figure 8).
First, we thought that the authors added new checks before calling read_proc function, making clear link between Olympic Destroyer and TVSpy (how, after all, could there be the same code modifications if the authors were not the same?). However, after further review we figured that read_proc didn’t exist anymore. Instead it was compiled inline resulting in a statically linked memcpy function.
Also the meaningless check in LoadDll (“if (read_proc)”) we mentioned before has disappeared in the new version of the code (see Figure 10).
The Bottom Line – Evidence is Inconclusive
In conclusion, we believe that this is not enough evidence to substantiate a claim that Olympic Destroyer and new versions of TVSpy using the same modified source code are built by the same author.
The more probable version for us is that the sample was built on a new compiler that further optimized the code. It would still mean that both new version of TVSpy and Olympic Destroyer are built using the same toolchain configured in the very same way (to enable full optimization and link C++ runtime statically). We actually went to the extent of compiling the LoadDLL on MS Visual Studio 2017 with C++ runtime statically linked, and we managed to get the very same code as the one included in both Olympic Destroyer and TVSpy.
Although we would have liked to finally solve the dilemma, and unveil which were the actors behind the Olympic Destroyer attack, we ended up with more questions than answers, but admittedly, that’s what research sometimes is about.
First, why would the authors of an allegedly state sponsored malware use an old LoadDLL project from an open source project from 2014? It is hard to believe that they could not come up with their own implementation or use much more advanced open-source projects for that, and definitely not relying on an educational prototype buried way beyond the first page of results in Google.
Or maybe the actors were not that much advanced as we would like to think, maybe seeing this as a one-time job, without enough resources to avoid using publicly available source code to quickly build their malware? Or maybe it’s just another red flag, and the real authors decided to use the TVSpy source code as released in 2015 to leave a “Russian fingerprint”?
Maybe all of the above?
At the beginning of this article we stated that attribution is not just hard, it’s often a wilderness of mirrors and more often than not, a bit anticlimactic. As a matter of fact, that was quite a precise prediction.