Uncovering Nation-Specific, Targeted Attacks ( . . . without Knowing Korean)

Authored by: Alexander Sevtsov

Phishing emails with suspicious attachments are one of the most popular malware delivery vectors today. These attachments usually consist of script-based downloaders (WSF, JS, HTA) or, more commonly, Microsoft Office files with embedded macro code. Social engineering methods are used to 1) gain the victim’s trust regarding both the sender and the attachment, 2) entice them into opening the file, and 3) cause them to enable macro code execution, which is disabled by default but essential for the attack to proceed.

However, there is a problem for the attacker: if the user refuses to enable macros, the attack is stopped. In response, cybercriminals often turn to exploits (and document exploits in particular), which solve this problem: no user interaction beyond opening the attachment is needed to initiate the infection. These types of exploits are usually delivered as malformed files that, when opened in a vulnerable version of an application (such as an out-of-date browser, video player, or document viewer), trigger malicious activity.

Document Exploit Techniques

Security analysts and researchers generally use two different techniques to detect and mitigate document exploits. The first method, static analysis, finds anomalies in the document’s structure and file format. The second approach, dynamic analysis, executes the file in an instrumented, virtualized environment, or sandbox, to intercept the malicious functionality.

Static approaches are somewhat limited because they can only detect previously known anomalies. Dynamic analysis, on the other hand, requires the system to have a specific version of the vulnerable software installed. This is a problem because it’s virtually impossible for a sandbox to install every possible version of every application.

Fortunately, by combining both static and dynamic analysis, along with deep content inspection and advanced analytics, security analysts can overcome the above-mentioned weaknesses to detect and thwart document exploits.

In today’s blog post, we are going to dive deep into a malicious Hangul Word Processor (HWP) document file and demonstrate how we use advanced technologies to detect encryption, evasion, code injection, and other malicious capabilities. HWP documents are widely used in South Korea, including by the government. For today’s analysis, we pulled an interesting sample out of our database of malware intelligence.

HWP Documents

There are a couple of reasons why this particular type of file is interesting and caught our attention: first of all, although not nearly as popular as MS Office documents, HWP files are used in targeted attacks in a very specific country. Secondly, starting from HWP Document File Format version 5.0, most streams in HWP documents are zlib-compressed, which hinders the static detection of embedded PEs, malicious JavaScript files, or shellcode. Moreover, such raw streams don’t have any known file headers that can be effectively used to find a suitable decoding/decompression algorithm to unpack them generically.

Figure 1. HWP document opened in HWP Viewer

During the remainder of this post, we will dissect our sample HWP document exploit step by step. We’ll begin with a shellcode analysis, using both static and dynamic techniques.

Static Shellcode Analysis

MD5 hash: 696c4403fe65089f76361860849d26c4
VT coverage as of 2017/08/25: 10/58

Let’s use Far Manager to begin our inspection.

Figure 2. HWP document content opened in Far Manager

Immediately after viewing the file, we see that it contains an embedded Encapsulated PostScript (EPS) file, which automatically triggered a notification alert via Lastline’s Static Document Analysis.

Figure 3. Document Structure Tab of Lastline’s Static Document Analysis

In most cases, EPS is simply one of the many formats in which an image can be embedded in a document. Technically, however, it is a powerful, stack-based programming language with variables, operators, loops, conditions, and procedures for creating vector graphics. Given this expressiveness, cybercriminals have recently started using it for document exploits.

It’s worth mentioning that the mere presence of an embedded EPS file is a sign that something suspicious is going on: embedding an EPS file is a common tactic used by hackers in Microsoft Office exploits, and we usually don’t see them in benign document files. As a matter of fact, its use is so uncommon and so potentially dangerous that in April 2017 Microsoft turned off the Encapsulated PostScript (EPS) Filter in Microsoft Office by default as a defense-in-depth measure.

As a next step, let’s extract the embedded EPS file located in the BinData subdirectory and rename it “compressed.eps” (another way of dumping the stream is to use the oledump script).

Figure 4. Extracted zlib-compressed EPS file content

Now we decompress the stream (using the zlib.decompress method in Python) and try to analyse it statically. We also convert the long hexadecimal string (which presumably represents assembled machine code) to binary (by calling the binascii.unhexlify method in Python).
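The two steps can be sketched in a few lines of Python. This is a minimal sketch rather than the exact script we used. It assumes that the stream may be raw deflate data without a zlib header (hence the fallback to a negative wbits value) and that the "long hexadecimal string" is simply the longest run of hex digits in the decompressed text. The helper names and the 64-character minimum length are hypothetical.

```python
import binascii
import re
import zlib


def decompress_stream(data: bytes) -> bytes:
    """Inflate a stream, trying the zlib wrapper first, then raw deflate."""
    try:
        return zlib.decompress(data)
    except zlib.error:
        # Assumption: the stream is raw deflate with no zlib header.
        return zlib.decompress(data, -15)


def longest_hex_blob(text: bytes, min_len: int = 64) -> bytes:
    """Return the longest run of hex digits in `text`, converted to binary."""
    runs = re.findall(rb"[0-9A-Fa-f]{%d,}" % min_len, text)
    if not runs:
        return b""
    blob = max(runs, key=len)
    # unhexlify needs an even number of digits; drop a trailing odd one.
    blob = blob[: len(blob) - (len(blob) % 2)]
    return binascii.unhexlify(blob)
```

Reading compressed.eps as bytes, passing it through decompress_stream, and then through longest_hex_blob yields a binary buffer that can be inspected in a hex editor or disassembler.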

The first thing that catches our attention is the repetition of the bytes 0xB2 0xB2, which encode the processor instruction MOV DL, 0xB2. Repeated a few hundred times, it does nothing but move the same value into the DL register. This is a classic NOP sled, a technique commonly used by exploit authors to “slide” the CPU’s execution flow to its final destination whenever the program branches to a memory address anywhere on the sled. It is another telltale sign that this is a malicious file.

Immediately after this long sequence, we see another NOP operation (0x90), followed by the beginning of the actual shellcode:

Figure 5. Binary data of decompressed EPS showing NOP sleds and shellcode
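Long runs of a single repeated byte, such as the 0xB2 sled described above, are easy to flag automatically. The following is a rough sketch of such a check. The function name and the 64-byte threshold are assumptions chosen for illustration, not values taken from our detection logic.

```python
def find_byte_runs(data: bytes, min_run: int = 64):
    """Yield (offset, length, byte_value) for each long run of one repeated byte."""
    i = 0
    n = len(data)
    while i < n:
        j = i + 1
        # Extend the run while the next byte equals the first byte of the run.
        while j < n and data[j] == data[i]:
            j += 1
        if j - i >= min_run:
            yield i, j - i, data[i]
        i = j
```

A run of 0xB2 or 0x90 bytes directly in front of code-like data is a strong hint of a sled. Runs of 0x00 or 0xFF are common in benign files, so a real detector would need to weigh the byte value and its context.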

As mentioned above, EPS is actually a scripting language. The attacker can use this to their advantage: at the very end of the EPS file, we can find the following code:

Figure 6. EPS script (Heap Spraying)

This loop is responsible for filling the memory of the Hangul process with 1024 copies of the NOP sled and shellcode (stored in the D40 variable). This technique, known as heap spraying, copies the shellcode into many memory locations, which increases the chance that hijacked control flow lands on one of the copies and makes the vulnerability easier to exploit.

Since shellcode is generally position-independent code, it must locate itself in memory by retrieving the current instruction pointer. This value is stored in the EIP register and cannot be read directly by software. One possible way to find the current address in memory is to use a CALL/POP (or JMP/CALL) instruction sequence:

Figure 7. Finding EIP in memory
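The simplest CALL/POP form is a CALL to the very next instruction (opcode E8 with a zero displacement) followed by a one-byte POP into a general-purpose register. As an illustration, the sketch below searches a buffer for that byte pattern. It covers only this one variant, which may differ from the exact sequence in our sample, and the function name is hypothetical.

```python
def find_call_pop(code: bytes):
    """Return offsets of `CALL $+5` immediately followed by a one-byte `POP r32`."""
    hits = []
    start = 0
    while True:
        off = code.find(b"\xe8\x00\x00\x00\x00", start)
        if off == -1:
            return hits
        nxt = off + 5
        # 0x58..0x5F are the one-byte POP EAX..POP EDI opcodes.
        if nxt < len(code) and 0x58 <= code[nxt] <= 0x5F:
            hits.append(off)
        start = off + 1
```

The CALL pushes the address of the following instruction onto the stack, and the POP then loads that address into a register, which gives the shellcode a known reference point.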

After establishing its position in memory, the shellcode begins to decrypt the next portion of its code by using an XOR algorithm and a hardcoded key, as shown in the snippet below:

Figure 8. Beginning of the shellcode
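Once the key and the bounds of the encrypted region have been read from the disassembly, the decryption can be reproduced offline. The helper below is a generic repeating-key XOR sketch. The key length and value used by the sample are not reproduced here, so any key passed in is a placeholder.

```python
from itertools import cycle


def xor_decrypt(data: bytes, key: bytes) -> bytes:
    """XOR `data` with `key`, repeating the key as often as needed."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))
```

Because XOR is its own inverse, the same function both encrypts and decrypts, which is handy for checking a recovered key against a known plaintext fragment.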

Next, the now-decrypted shellcode is responsible for finding the base address of kernel32.dll in memory by parsing the Process Environment Block (PEB) and decrypting the final part of the shellcode (which will later be injected into a legitimate process):

Figure 9. Second stage shellcode analysis (finding the kernel32.dll base address and second decryptor)

In order to look up necessary functions, shellcode usually compares hashes of API functions (ROR-13 in this case) instead of names to complicate manual analysis.

Figure 10. hashString function implementation
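To map the hashes back to API names, an analyst can reimplement the hash and precompute it for a list of candidate exports. The sketch below follows the common ROR-13 variant (rotate the accumulator right by 13 bits, then add the next character, with no terminating null byte). Whether the sample's hashString function matches this variant in every detail is an assumption, and the function names are ours.

```python
def ror32(value: int, count: int) -> int:
    """Rotate a 32-bit value right by `count` bits."""
    count %= 32
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF


def ror13_hash(name: str) -> int:
    """Hash an API name with the common rotate-right-13-then-add scheme."""
    h = 0
    for ch in name.encode("ascii"):
        h = ror32(h, 13)
        h = (h + ch) & 0xFFFFFFFF
    return h


def build_lookup(names):
    """Map each hash value back to the API name that produced it."""
    return {ror13_hash(n): n for n in names}
```

Feeding build_lookup the export names of kernel32.dll produces a table in which the constants found in the shellcode can be looked up directly.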

After resolving the necessary API function addresses, the shellcode injects code into a remote process. To do this, it first spawns a copy of cmd.exe by calling the CreateProcessA function. Then it allocates memory in the new process by calling VirtualAllocEx, and writes the shellcode into the allocated memory through WriteProcessMemory. Finally, it creates a remote thread for its execution via CreateRemoteThread:

Figure 11. Remote code injection into cmd.exe
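From a detection standpoint, this call sequence is a recognisable behavioural pattern. As a simplified sketch, the function below checks whether the four API names appear in order in a recorded call trace. The trace format (a plain list of API names) is an assumption made for illustration and is not how the Lastline engine represents behaviour.

```python
INJECTION_CHAIN = (
    "CreateProcessA",
    "VirtualAllocEx",
    "WriteProcessMemory",
    "CreateRemoteThread",
)


def contains_chain(trace, chain=INJECTION_CHAIN) -> bool:
    """Return True if `chain` occurs in `trace` as an ordered subsequence."""
    it = iter(trace)
    # Each step consumes the shared iterator up to its match, so order matters.
    return all(any(call == step for call in it) for step in chain)
```

A realistic rule would also confirm that all four calls refer to the same target process handle, since each API on its own is used by plenty of benign software.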

Dynamic Shellcode Analysis in the Lastline Sandbox

The most effective way of automating shellcode analysis is to embed it into an executable file and run it. This technique allows a security researcher or incident responder to investigate malicious code in detail without even having a vulnerable application at hand. For this purpose, we convert the shellcode into an EXE as described by @malwareunicorn, and submit the generated file to the Lastline analysis system:

Figure 12. Shellcode analysis overview from Lastline analysis engine

As we can see in the analysis overview, the shellcode injects code into a legitimate process in order to retrieve a binary file (disguised as a JPEG) from a remote server. Additionally, it attempts to evade dynamic analysis in different sandboxes (as we will cover shortly).

Figure 13. Network communication with a C&C

Following the shellcode’s execution, we can observe that it tries to read the remote file (by calling the InternetReadFile function) from a handle opened via InternetOpenUrl. This is done to keep the binary data in memory without dropping anything to disk, which could raise suspicion from host-based security solutions. Then the shellcode jumps to the code in the retrieved binary (highlighted) to continue execution, where it uses the same pattern for retrieving the EIP in memory as was described above:

Figure 14. Beginning of shellcode in the camouflaged JPG file

Afterwards, the shellcode again resolves essential APIs, decrypts an embedded executable file, and runs it. The decryption routine is implemented using an XOR algorithm with a random key:

Figure 15. Decrypting an embedded executable file in memory
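When the key is not known in advance, the fixed structure of a PE file offers a known-plaintext shortcut. The sketch below assumes, purely for illustration, a single-byte XOR key. It derives the key from the expected first byte ('M' of the MZ header) and accepts it only if the decrypted buffer also has a PE signature at the offset stored in e_lfanew. The key scheme in our sample may well differ, and the function names are hypothetical.

```python
import struct


def looks_like_pe(buf: bytes) -> bool:
    """Check for an MZ header and a PE signature at the e_lfanew offset."""
    if len(buf) < 0x40 or buf[:2] != b"MZ":
        return False
    (e_lfanew,) = struct.unpack_from("<I", buf, 0x3C)
    return buf[e_lfanew:e_lfanew + 4] == b"PE\x00\x00"


def recover_single_byte_key(blob: bytes):
    """Return (key, plaintext) if a one-byte XOR key yields a PE file, else None."""
    if not blob:
        return None
    key = blob[0] ^ 0x4D  # The first plaintext byte of a PE file is 'M'.
    plain = bytes(b ^ key for b in blob)
    return (key, plain) if looks_like_pe(plain) else None
```

The same idea extends to longer keys by using more of the predictable DOS header bytes as known plaintext.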

The embedded PE file performs a series of evasion tricks, including checking for well-known dynamic analysis systems and virtual machines (such as Sandboxie, Sunbelt, VMware, VirtualBox) by looking for specific registry values (SystemBiosVersion, SMBiosData) and loaded DLLs:

Figure 16. Checking loaded DLLs associated with different Sandboxes and VMs
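Checks like these often leave recognisable strings in the unpacked binary, which makes them easy to triage statically. The sketch below scans a buffer for a few indicator strings in both ASCII and UTF-16LE form. The registry value names come from the analysis above, while the DLL names are commonly cited examples that we assume here for illustration. They are not a transcription of the list in Figure 16.

```python
INDICATORS = {
    "SystemBiosVersion": "BIOS registry value",
    "SMBiosData": "BIOS registry value",
    "SbieDll.dll": "Sandboxie",     # assumed example DLL name
    "api_log.dll": "Sunbelt",       # assumed example DLL name
    "dir_watch.dll": "Sunbelt",     # assumed example DLL name
}


def find_evasion_indicators(binary: bytes) -> dict:
    """Return the indicator strings present in `binary`, case-insensitively."""
    haystack = binary.lower()
    found = {}
    for needle, label in INDICATORS.items():
        low = needle.lower()
        # Look for both narrow and wide (UTF-16LE) encodings of the string.
        for enc in ("ascii", "utf-16-le"):
            if low.encode(enc) in haystack:
                found[needle] = label
                break
    return found
```

A hit on its own proves little, but several indicators from different products in one binary are a good reason to look closer at how the sample behaves under analysis.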

As the Lastline analysis report showed, the sample fully executed in our sandbox, triggering all of the file’s malicious behavior. Thus, the analysis engine successfully defeated the malware’s evasion attempts, executed the shellcode, and transferred execution to the encrypted payload retrieved from the C&C.

Summary

In this blog post, we have shown a sophisticated attack which ended up executing a malicious PE file through multi-staged shellcode embedded in an Encapsulated PostScript (EPS) file and a camouflaged image file, and how the attack can be analysed by both static and dynamic methods.

As one can see, the attackers exploit not only popular applications such as Microsoft Office but also nation-specific ones to propagate malware. This, of course, complicates the analysis of these kinds of threats and requires more attention from security analysts and malware researchers to provide the best mitigation schemes to confront the attacks.

Alexander Sevtsov

Alexander Sevtsov is a Malware Reverse Engineer at Lastline. Prior to joining Lastline, he worked for Kaspersky Lab, Avira and Huawei, focusing on different methods of automatic malware detection. His research interests are modern evasion techniques and deep document analysis.