I Hash You: A Simple But Effective Trick to Evade Dynamic Analysis
Apparently “Every new day is new evasion trick day” is a valid motto for many malware authors nowadays. The latest sample we are adding to our collection is a banking malware that tries to evade analysis by carefully checking its own filename. While our backend strictly preserves the original name, knowing the tricks employed by this malware can be essential during threat hunting or an incident response (IR) investigation.
The sample in question (SHA-1: 4793f245ee6f04f836071528f9f66d3a9a678341) is a variant of the highly evasive banker Gootkit and is the subject of this blog post.
Delivery Vector – Social Engineering
Based on our internal telemetry data, the main delivery vectors for this malware are document files, mainly macro-based downloaders and documents with embedded PE files. No known exploits are involved in this attack. Figures 1 and 2 show two examples of malicious documents, along with the usual lure encouraging victims to either enable macro code execution or double-click the embedded executable file.
Evasion Tricks – Checking Own Filename
Once the user falls for either suggestion, the malware starts executing. The first evasion trick consists of checking its own filename, which the malware retrieves by calling the PathFindFileName function. It then hashes the filename using the algorithm shown in Figure 3.
The constant highlighted in Figure 3 quickly gives away that we are dealing with a standard CRC32 hashing algorithm with no table lookup (the variant commonly referred to as crc32b). The resulting hash is then matched against the entries of a blacklist (see Table 1). This is clearly meant to frustrate analysts: as there is no actual string comparison, it is not possible to easily extract the actual entries. The only option is to re-implement the hash algorithm and either brute-force it or rely on a dictionary.
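A minimal sketch of such a re-implementation is shown below. It is the textbook bit-by-bit CRC32 (no lookup table), not code lifted from the sample. Two points are assumptions: that the constant in Figure 3 is the usual reflected polynomial 0xEDB88320, and that the filename is upper-cased before hashing (suggested only by the upper-case names recovered in this post). The helper names `crc32_bitwise` and `hash_filename` are hypothetical.

```python
CRC32_POLY = 0xEDB88320  # reflected CRC-32 polynomial (assumed to be the Figure 3 constant)


def crc32_bitwise(data: bytes) -> int:
    """Standard CRC32 computed one bit at a time, without a lookup table."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; apply the polynomial when the dropped bit was set.
            if crc & 1:
                crc = (crc >> 1) ^ CRC32_POLY
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


def hash_filename(name: str) -> int:
    """Hash a filename; upper-casing is an assumption, not confirmed by the sample."""
    return crc32_bitwise(name.upper().encode("ascii"))


if __name__ == "__main__":
    print(hex(hash_filename("myapp.exe")))
```

For standard CRC32 this function returns the same values as Python's built-in `zlib.crc32`, which is a quicker choice for bulk hashing.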
Table 1. Blacklisted CRC32 hashes and their input strings (the input strings are not unique, since different strings can produce the same CRC32 value)
We tried both approaches and managed to recover the original filenames. This helped us get an idea of the analysis systems targeted by this evasion trick. As it turns out, the sample targets not only sandboxes but also manual analysis sessions: renaming the analyzed file is quite a common practice among researchers in training who have just started working with instrumented environments. To them we say: filenames matter, and should at least be randomized.
Note that this behavior is not unique. In fact, we found several samples in our intelligence backend that use similar names as part of their anti-analysis tricks (see Figure 4).
Unfortunately, none of them helped us figure out the last blacklisted entry. Brute forcing only produced a couple of collisions, DNYYQXU.EXE and BLRARSHA.EXE, both of which look too random to be the actual blacklisted file name.
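The dictionary approach can be sketched as follows. This is an illustration under stated assumptions, not the tooling we used: the function names are hypothetical, filenames are assumed to be upper-cased before hashing, and the word list and target hash in the demo are placeholders rather than entries from Table 1.

```python
import zlib
from typing import Dict, Iterable, List, Set


def filename_hash(name: str) -> int:
    """Standard CRC32 of the upper-cased name (upper-casing is an assumption)."""
    return zlib.crc32(name.upper().encode("ascii"))


def dictionary_lookup(blacklist: Set[int], stems: Iterable[str],
                      extensions: Iterable[str] = (".EXE",)) -> Dict[int, List[str]]:
    """Return every candidate filename whose hash appears in the blacklist."""
    exts = list(extensions)
    recovered: Dict[int, List[str]] = {}
    for stem in stems:
        for ext in exts:
            candidate = (stem + ext).upper()
            digest = filename_hash(candidate)
            if digest in blacklist:
                recovered.setdefault(digest, []).append(candidate)
    return recovered


if __name__ == "__main__":
    # Placeholder target: the hash of a made-up name, NOT a value from Table 1.
    targets = {filename_hash("SAMPLE.EXE")}
    words = ["invoice", "sample", "setup"]  # hypothetical word list
    for digest, names in dictionary_lookup(targets, words).items():
        print(f"0x{digest:08X} -> {names}")
```

A useful word list would combine names commonly given to samples by analysts and sandboxes with a handful of typical extensions.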
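Collisions of this kind are expected from exhaustive search: there are 26^7 (roughly 8 billion) seven-letter upper-case stems but only 2^32 (roughly 4.3 billion) possible CRC32 values, so at that length every hash has, on average, more than one matching name. The sketch below shows the brute-force idea on a deliberately small search space. The function name, the alphabet, and the fixed ".EXE" extension are assumptions for illustration; a real search over 7 or 8 characters would need a compiled implementation.

```python
import itertools
import string
import zlib
from typing import Iterator, Set


def brute_force(blacklist: Set[int], max_len: int = 3,
                alphabet: str = string.ascii_uppercase,
                ext: str = ".EXE") -> Iterator[str]:
    """Yield every name of up to max_len letters whose CRC32 is blacklisted."""
    for length in range(1, max_len + 1):
        for letters in itertools.product(alphabet, repeat=length):
            name = "".join(letters) + ext
            if zlib.crc32(name.encode("ascii")) in blacklist:
                yield name


if __name__ == "__main__":
    # Placeholder target built from a made-up name, not a value from Table 1.
    target = {zlib.crc32(b"VM.EXE")}
    print(list(brute_force(target, max_len=2)))
```

Any name found this way only proves that it hashes to a blacklisted value. As with the two names above, judging whether it is the original entry still takes a human.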
Analysts might be tempted to think that randomizing file names would do the trick. That would normally work, but remember that you would still be exposed to the tiny chance of guessing a collision (for example, MYAPP.EXE and BBNURKU.EXE have the same hash, 0xE8CBAB78). This is hardly common for hash algorithms, but in the case of error-detecting codes it happens a tiny bit more often.
Another common approach is to rename a file to its MD5 or SHA-1 value (such as 02d41d2a7b50b7ee561eef220a7b57df.exe). This is definitely a bad practice, and much worse than a fully randomized name, since there is nothing stopping the malware from doing the very same computation and simply evading analysis in case of a match. Interestingly, this is also the default filename when downloading artifacts from platforms like VirusTotal, isn’t it?
Going back to the Gootkit sample, in our case the malware is a bit more aggressive and simply terminates execution if the file name is at least 32 characters long (incidentally, the length of an MD5 digest written in hexadecimal). For this reason, our recommendation to customers manually submitting artifacts is to always avoid long (possibly hash-based) file names, as they can hinder a correct dynamic analysis.
The best way to mitigate these evasion attempts is to rely on the origin of the sample and, if possible, retrieve the original filename. We understand this might not always be easy when done manually. For example, the file name might originate from a malicious downloader or from an embedded resource.
In our sandbox, we automate this step: we monitor the real network traffic and execute the artifacts as they are downloaded, meaning that we always preserve the original name (see Figure 5).
Sometimes the simplest tricks are the best, and can even frustrate analysts (imagine if the sample executed only when the original name was preserved, for example). In this article, we went through some simple examples and explained how to avoid the most common pitfalls. In conclusion, be careful when submitting a file manually to a dynamic analysis system and, if in doubt, let the system unpack or download the artifact you want to analyze.
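To put a number on that chance: a uniformly random name hits a blacklist of N CRC32 values with probability N / 2^32. When the blacklisted hashes are already known, that residual risk can be removed by re-rolling on a match. The sketch below is hypothetical in its function name, name length, and the upper-casing step. It only protects against blacklists you already know about, so preserving the original filename remains preferable.

```python
import secrets
import string
import zlib
from typing import Set


def random_safe_name(blacklist: Set[int], length: int = 8, ext: str = ".exe") -> str:
    """Draw random names until one does not hash to a known blacklisted CRC32."""
    alphabet = string.ascii_lowercase
    while True:
        stem = "".join(secrets.choice(alphabet) for _ in range(length))
        name = stem + ext
        # Upper-casing before hashing is an assumption about the sample.
        if zlib.crc32(name.upper().encode("ascii")) not in blacklist:
            return name


def hit_probability(blacklist_size: int) -> float:
    """Chance that one uniformly random name matches any blacklisted CRC32."""
    return blacklist_size / 2**32


if __name__ == "__main__":
    print(random_safe_name(set()))
    print(hit_probability(10))
```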
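A small pre-submission check along these lines can catch both pitfalls before a file reaches the sandbox. It is a sketch under assumptions: the function name is hypothetical, the 32-character threshold is applied to the full name including the extension (the conservative reading, since we do not state here whether the sample counts the extension), and the list of digest algorithms is illustrative.

```python
import hashlib
from pathlib import PureWindowsPath
from typing import List, Optional

MIN_RISKY_LENGTH = 32  # threshold described for this Gootkit variant


def risky_name_reasons(filename: str, content: Optional[bytes] = None) -> List[str]:
    """Return the reasons why a filename may trigger filename-based evasion."""
    path = PureWindowsPath(filename)
    reasons: List[str] = []
    # Conservative assumption: count the extension as part of the length.
    if len(path.name) >= MIN_RISKY_LENGTH:
        reasons.append("name is at least 32 characters long")
    if content is not None:
        stem = path.stem.lower()
        # Malware can hash its own file and compare the result with its name.
        for algo in ("md5", "sha1", "sha256"):
            if hashlib.new(algo, content).hexdigest() == stem:
                reasons.append(f"name matches the {algo} digest of the content")
    return reasons


if __name__ == "__main__":
    print(risky_name_reasons("02d41d2a7b50b7ee561eef220a7b57df.exe"))
    print(risky_name_reasons("invoice.exe"))
```

An empty list means that neither check fired. It does not mean the name is safe against other blacklists.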
Latest posts by Alexander Sevtsov (see all)
- I Hash You: A Simple But Effective Trick to Evade Dynamic Analysis - April 10, 2018