Evolution of Excel 4.0 Macro Weaponization

Evolution of Excel 4.0 Macro Weaponization

Abstract

Excel 4.0 (XL4) macros are becoming increasingly popular for attackers, as security vendors struggle to play catchup and detect them properly. This technique provides attackers a simple and reliable method to get a foothold on a target network, as it simply represents an abuse of a legitimate feature of Excel, and does not rely on any vulnerability or exploit. For many organizations, blacklisting isn’t a viable solution, and any signatures to flag these samples must be precise enough not to trigger on files that leverage this feature legitimately. As this is a 30-year-old feature that has only been discovered and exploited en mass by attackers in the last year, many security vendors do not currently have detection mechanisms in place to trigger on these samples, and building reliable signatures for this type of attack is not a small task.

The Lastline Threat Research Group has observed thousands of samples leveraging this technique, and has been monitoring and tracking trends for the last 5+ months. Intercepting these samples has provided valuable data to build statistics, identify trends, find outliers, and track campaigns. We were able to cluster samples into distinct waves, which clearly display how this technique has evolved through time to become more sophisticated and more evasive. As XL4 macros represent somewhat “uncharted territory,” malware authors and security researchers are making new discoveries daily, pushing the boundaries of this technique and identifying ways to evade detection and obfuscate their code. The techniques employed by these attackers include ways to evade automated sandbox analysis and signature-based detection, as well as hands-on analysis performed by malware analysts and reverse engineers. As previously mentioned, these techniques appear to surface in waves, with each new wave introducing new techniques, building on the previous wave or cluster. Techniques used in the first wave of samples we observed in February are still being leveraged in samples being discovered today. In this blog post, we describe each wave and cluster in detail, by breaking down every new technique discovered, and explaining why each is significant, effective, or ineffective.

Executive Summary

Through clustering thousands of samples and performing in-depth code-level analysis of each cluster, we were able to visualize and witness the evolution of this threat. We found that roughly every 1-2 weeks, a new wave of samples emerged, each more evasive and sophisticated than the last. Each of these waves appeared to build on its predecessor, extending its functionality by introducing new techniques on top of what already was being used. The size of these clusters suggest that these samples are being generated with some sort of toolkit or document generator, as these samples resemble one another too closely to not be related. Evasion routines and obfuscation were the primary areas of evolution, as the base functionality of these samples remained the same – download and invoke a more persistent payload, such as an EXE or DLL file.

Introduction

Excel 4.0, or XLM macros, is a 30-year-old feature of Microsoft Excel that has been gaining popularity among malware authors and attackers, especially over the last year (see Chart 1). This type of macro code is actively being abused and weaponized by attackers to deliver additional, more persistent malware. What makes this technique effective is that much like the more popular and up-to-date VBA macros, Excel 4.0 macros are a component of legitimate Excel functionality, thus will likely never be disabled, as they are used regularly for benign business purposes. For example, the commonly used SUM function is used in many spreadsheets to obtain the sum of a range of cells. Macros of this type are commonly referred to as “formulas.”

Chart 1: Six months of attacks detected using XL4 macros.

This technique has been effective because although it is an old feature, security vendors may not have yet devised detection techniques for this type of attack. On top of this, organizations will likely never be able to disable this feature in Excel, due to the fact that it is used regularly for legitimate purposes. For this reason, malware authors now have another reliable method for initial access, as these malicious documents are being successfully delivered via email attachments.

The Problem

Security vendors are having difficulty detecting this threat, likely due to not having solutions in place to properly assess and parse the format and structure of how these macros are stored in Excel documents [2]. These macros are very straightforward and easy to create, thus easy to modify to bypass signature-based detection. Macros are also robust, and provide various functions that can be leveraged to evade analysis, such as obfuscating the final payload, modifying the control flow, or detecting automated sandbox analysis through specific host environmental checks.
This technique will likely remain relevant, and join its successor (i.e., VBA macros) as a widely used technique to weaponize document files. This technique does not rely on a bug, it is not an exploit, but it simply abuses legitimate Excel functionality. These macros can be set to auto-execute, and run as soon as a workbook is opened if macros are enabled. As this is somewhat uncharted territory, malware authors and researchers are still exploring the depths of possibilities and capabilities of weaponizing this attack technique.

Contribution

The Threat Research Group at Lastline has clustered and analyzed thousands of XL4 samples, as we have been tracking and monitoring this threat for over five months. We were able to cluster and classify these samples into clear waves or trends. The bulk of these samples appear to be created with the same toolkit or document generator, as each wave builds on the previous, adding new functionality with each iteration. This additional functionality typically consists of more sophisticated obfuscation or evasion routines. These stage-1 spreadsheet samples are in turn delivering a variety of commodity malware families, such as Danabot, ZLoader, Trickbot, Gozi, and Agent Tesla.

Evolution Timeline

The timeline below (Figure 1) displays how these macros have evolved over the past four months. Each block represents a significant wave or cluster that exhibited new behavior and functionality that we had not yet observed at scale. We will elaborate on each cluster in greater detail in the following sections.

Figure 1: Evolution of Excel 4.0 macros weaponization.

Evolution Breakdown

 

Cluster 1: ~2020.02.14

This set of samples is the first significant wave of weaponized XL4 documents we observed. These samples all contain a hidden macro sheet that holds the payload, and also an image (see Figure 2) that is used to social engineer the user into enabling the macro code. The techniques introduced in this cluster will be leveraged and built upon by almost all the waves that followed.

Figure 2: Embedded image to entice users to enable dynamic content.

 

Unhiding the hidden macrosheet shows the payload (Figure 3), which is not obfuscated in any way.

Figure 3: Payload of the hidden macrosheet.

Evasion Routine:

  1. A sandbox check is performed by requiring user interaction with a message box (Figures 4 and 5). If “OK” is clicked in the message box, the next check in the evasion routine is performed, otherwise the macro will exit.
    CELL FORMULA:
Figure 4: First check in evasion routine.
Figure 5: MessageBox that requires user interaction.

2. Additional sandbox evasions commence, this time using the GET.WORKSPACE function [1] (see Figure 6). This function will provide information about the environment that the Excel spreadsheet is being viewed in. The two checks are for mouse capability (constant of 19), and audio capability (constant of 42). The malware exits if either of these two conditions are not met.

Figure 6: Evasion routine – Mouse and audio capabilities.

3. An additional check for the victim’s OS commences (Figure 7). This is performed through checking for the string ‘Windows’ being present in the return value of the call to GET.WORKSPACE:

Figure 7: Evasion routine – Check for Windows environment.

The Download:

An additional payload – a cell formula – is downloaded via a web query (Figures 8 and 9). These web queries are stored in the DCONN (data connection) record type [3].

Figure 8: Hexdump of DCONN record / Web Query.

Figure 9: Web Query / data connection in Excel.

The data from this network connection will be written to cells mapped to the name fgsb4g (shown at bottom of the hex dump in Figure 8), which the Excel name manager shows are range $Y$100:$Y$103 (Figure 10).

Figure 10: Name Manager showing destination cells.

A small loop is then created between two cells that check whether the payload was downloaded successfully (Figure 11). These cells are checking for the string “LOS” anywhere within cell Y103. This is because at the end of the downloaded cell formula payload, the malware calls =CLOSE().

Figure 11: Checking for successful download

Cluster 2: ~2020.02.26

The second cluster of February added minor obfuscation through hiding the payload by using a white font on white background (shown in red in Figure 12), and by scattering the code around the worksheet. This makes following control flow and identifying important code blocks a bit more challenging for manual analysis, but should cause no significant issues for dynamic analysis, noting that the first cluster included significant attempts to evade dynamic analysis.

Figure 12: Scattered payload (originally payload was written in white font).

Cluster 3: ~2020.02.27

This cluster protects itself more than the previous clusters, as it hides the payload in a veryhidden macro sheet instead of a hidden macro sheet (note that both types of macro sheets are Excel documented features). This means that the macro sheet cannot be unhidden via the traditional method in Excel, but instead a specific flag in the binary (see Figure 14) must be modified in order to unhide and view the contents of the worksheet.

Once unhidden, this macro appears to extend on the dynamic analysis evasion routines from Cluster 1 by performing additional guest OS environmental checks (Figure 13). These two new checks also use the previously mentioned GET.WORKSPACE function, but this time they attempt to identify the height and width of the workspace. This check ensures that the display size of the workspace points (similar to pixels) are greater than the minimum value, 770×380, in this cluster.

Figure 13: Checking the size of the display of the workspace.
Figure 14: Veryhidden flag (0x2).

Cluster 4: ~2020.03.06

The new techniques introduced by this cluster include building a path to three different files, one being a VBS script, which is a technique we didn’t see in previous clusters. These paths (Figure 15) will be used by the downloaded formula, the payload, using the same DCONN web query described previously.

Figure 15: Additional code – Paths to files.

Also, instead of FORMULA.FILL being used to move the downloaded payload around the sheet, FORMULA is used (Figure 16). This minor change is likely an evasion to signature-based detection of the previously used FORMULA function.

Figure 16: FORMULA instead of FORMULA.FILL

Cluster 5: ~2020.03.10

This cluster is the first batch of samples where we observed heavy usage of the CHAR function (Figure 17). This function translates an integer to its corresponding ASCII character. For example: CHAR(0x41) resolves to ‘A’. These characters are resolved, then concatenated one at a time to build the final payload. This is a common obfuscation technique across a variety of file formats and languages, but this is the first time we have seen the technique used in Excel 4.0 macros.

Figure 17: CHAR functions.

Within this same timeframe, we also observed the first large wave of samples that directly invoked the WinAPI (URLDownloadToFile) via the CALL function (Figure 18) instead of the previously mentioned DCONN download method, which is likely used to evade static detection of the prior download technique.

Figure 18: WinAPI usage – URLDownloadToFile.

Cluster 6: ~2020.03.30

The added functionality of this cluster includes a check for a non-default Excel Security setting in the registry as a possible dynamic analysis evasion, and the use of the WinAPI to register a DLL to execute the second stage payload. The code in Figure 19 shows that the native utility reg.exe will be spawned (see Figure 20) to extract a specific registry key, and write it to a registry file on disk (<random_integer>.reg).

Figure 19: Deobfuscated payload – reg.exe.
Figure 20: Process tree.

This registry file (<random_integer>.reg) is then read from bytes 215:470 (see Figure 21). The purpose of this read is to find and store the Excel Security-related registry settings of interest that the malware will check.

Figure 21: Macro reading registry file.

This appears to be a check for the File Validation setting in this instance; Figure 22 shows the hexdump of an example (carved out the 255 bytes).

Figure 22: Hexdump of file validation setting.

The macro then checks to see if the security setting is set, as shown in Figure 23.

Figure 23: Checking if the flag is set.

Cluster 7: ~2020.04.01

This cluster expects to be executed on a particular day, as it uses the current day of the month in a part of the deobfuscation routine. Hardcoded integers (Figure 24) will be subtracted from the current day of the month, and the difference will then be passed to the CHAR function. These samples were created on April 10th, and are expected to be run on that same day. We don’t see much of this “day-of” technique used outside of this cluster. This may be due to the attackers being less successful with this wave, perhaps because the victims checked their email (opening the document) in the days following its intended execution day.

Figure 24: Hard-coded integers to be decoded.

Each hard-coded integer is subtracted from the day of the month plus 7 (Figure 25), then passed to char.

Figure 25: Obtaining the day of the month and adding 7.

Figure 26 shows the part of the deobfuscation routine where the values in column A are decoded and concatenated into one string.

Figure 26: Deobfuscation routine.
 

If run on the incorrect day of the month, the XL4 macro will not deobfuscate and execute properly (Figure 27 shows an example of a failed deobfuscation).

Figure 27: Failed deobfuscation (executed on the wrong day).

Emulating this behavior in Python with the proper value showed the expected deobfuscation (See Figure 28).

Figure 28: Emulating deobfuscation routine implemented in Python.

Hardcoding the correct value (date) in the sheet allows the code to deobfuscate properly, as shown in Figure 29:

Figure 29: Expected deobfuscated payload.

Cluster 8: ~2020.04.14

Like the previous cluster, this cluster must also be executed on a particular day, but a new check is added to query the font size and row height (Figure 30). GET.CELL is used for both of these checks, font size of the text in cell A1 (19), as well as row height (17). This is likely to check whether the spreadsheet has been tampered with.

Figure 30: Obtaining cell font size and row height.

The result of these operations is written to cell AA181, which is used repeatedly during the deobfuscation routine. If these dimensions are not precisely as the malware authors expected, the payload will not be deobfuscated properly, and the second stage download will not occur.

Cluster 9: ~2020.04.25

The new technique introduced in this cluster employs dozens of independent macrosheets (Figure 31), whereas all previous clusters used only one or two sheets. We believe this is intended to have the sample blend in with benign workbooks, which tend to have a higher number of macro sheets than the malicious samples do. It also may be used to slow down malware analysts who will have to identify where the payload lies across the 20+ macro sheets (Figure 32).

Figure 31: Hidden macro sheets
Figure 32: List of all sheets.

Another interesting characteristic is that instead of the typical Auto_Open name for the automatic execution of the payload, the malware author has named it Auto_Open22 (Figure 33), which still executes as a normal Auto_Open routine.

Figure 33: Auto_Open22 in Name Manager

Cluster 10: ~2020.05.04

For the first time, we observed hidden names being leveraged to hide the starting point of Excel 4.0 macro code in a large cluster. This is likely an attempt to thwart static parsing and analysis, both by analysts and automated solutions. Figure 34 shows how the Name Manager shows no names.

Figure 34: No names due to being hidden.

This is achieved through setting the fHidden bit in the Lbl record for the defined name at offset 0x0. At offset 0x15 is the Name field with constant 0x1 for Auto_Open.

This value of 0x21 has the fHidden and fBuiltin flags set (Figure 35).

Figure 35: Binary view of set bit flags.

After flipping the hidden bit, and adding a character (“g”) to the first byte of the Name at the end of this record, the name is visible (Figure 36).

Figure 36: Unhidden name is now displayed in Name Manager.

This cluster also leverages an interesting deobfuscation routine, which is a bit different from what we have seen in previous clusters. Although we have seen control-flow obfuscation through GOTO and RUN functions, this cluster adds a step through setting a value to a cell for later use in the routine (Figure 37): set cell value, jump to next code block, repeat.

Figure 37: Formulas transferring control flow and setting payload.

Cluster 11: ~2020.05.11

This cluster introduces a few interesting evasion techniques by detecting window activity. These macros attempt to identify if the Excel window is hidden or maximized, through three different usages of GET.WINDOW() (Figures 38 and 39). If the Excel window is not maximized, the malware will exit prior to exhibiting any interesting behavior. This appears to be an anti-analysis or sandbox evasion trick, as Excel windows may not be maximized in these environments.

Figure 38: Stepping through evasion routine – GET.WINDOW.

An additional evasion technique introduced in this malware is identifying if the malware is being debugged/analyzed. This is achieved through checking if the macro is being run in Single Step mode (Figure 40), which is a way to debug and step through Excel 4.0 macro code. If Single Step mode is detected, the malware will exit.

Figure 40: Macro detecting Single Step mode.

Cluster 12: ~2020.05.19

Almost all previously mentioned clusters leverage the CHAR function heavily during their deobfuscation routines to build the final macro payload character by character. This cluster changes the original technique by using the MID function instead of the CHAR function. The MID function is used to extract a substring (text) from a string by providing an index number for where to start, as well as the length of the text to extract. Figure 41 shows the MID function being used to extract a substring for each string in column B, and write it to column C.

Figure 41: Example of MID function.

Instances of this function are being concatenated and passed to the FORMULA function (Figure 42) to deobfuscate the final payload at runtime.

Figure 42: Return values of MID functions passed to FORMULA.

This cluster also leverages the FILES function (Figure 43). This function is used to obtain a list of files within a target directory. The malware author uses the FILES function along with ISERROR to check if the download succeeded (Figure 44).

Figure 43: FILES function to return either an error or list of files.
Figure 44: Checking if the download succeeded.

Before downloading the next stage, the malware first checks to see it can access the Internet (Figures 45 and 46), by connecting to microsoft.com, and then checking that the requested download succeeded.

Figure 45: Testing Internet connection – Cell CX29183 holds the URL
Figure 46: URL in Cell CX29183

If this download is successful, the macro will then repeat this process to obtain the next malware stage, and then invoke it.

Conclusion

Excel 4.0 macros continue to prove their value to attackers, providing a reliable method to get their code to run on a target. In many environments, Excel worksheets with macros are used too heavily for legitimate business purposes to disable or blacklist, thus analysts and security vendors will have to get used to consistently updating tooling and signatures as attacks continue to evolve. Excel 4.0 macros provide a near endless list of possibilities for malware authors and are evolving, becoming more sophisticated each day.

Appendix

References

[1] https://d13ot9o61jdzpp.cloudfront.net/files/Excel%204.0%20Macro%20Functions%20Reference.pdf

[2] https://www.loc.gov/preservation/digital/formats/digformatspecs/Excel97-2007BinaryFileFormat(xls)Specification.pdf

[3] https://docs.microsoft.com/en-us/openspecs/office_file_formats/ms-xls/cd03cb5f-ca02-4934-a391-bb674cb8aa06

IOCs

Below a selection of samples for each of the aforementioned clusters.

Cluster 1

032b584b85e8335f83c751dd9c70121eb329bc08
30f908b7e97527e8a409ba93ec6258bdc7943b24
26b7691ba5bac30670790f2af8b67aec37609729
21deaf5c2adeb0ad3dae2048138c403691c3e71e
17d62231f60e97121f7258b0296feeadb7996d5f

Cluster 2

0b01f4a1ca08c0136d84e3ec91af856579784451
3db463c73a62b8bf0b6ab388884c6f3ad3a91c2f
2df8277a9ac5869fa7d5a5838fe6ab428207cf9c
Bb8d0af63f837e7f54d530d909feded623399a1b
C1217a7a59dc307089901556985239039cb89cbb

Cluster 3

0b80a32171c3f1b21015e880c7d3eb216eeff103
38d8c573b901bb0bf1cd1cc62945521923630e58
4a57bf593c6dc515a869b0facc4e1f2e3f8e85e1
2d8b6113d858f988977a82af82ca649963915ff7
1f66200bd820646e74b56ed8493794861eebc479

Cluster 4

15c3d132b4bc52f32541f6c7cb307467b7b85ee5
38d8c573b901bb0bf1cd1cc62945521923630e58
4a57bf593c6dc515a869b0facc4e1f2e3f8e85e1
2d8b6113d858f988977a82af82ca649963915ff7
1f66200bd820646e74b56ed8493794861eebc479

Cluster 5

13d35c6a71ee02362905982cfdd4f19ea7971fb0
52bc0b0ca4aa88000220627d150b4996e4a2bdc0
768d376f9ee85803940810a0d538c0ff4bd3ec4d
B547d9eb47e2977e4f85e92e62e9271875f3654b
D724559f4e1be192b8e60d1076ba6f00a76a7910

Cluster 6

Ced11986de75f108b631ffa8165108fb14b81be5
909196df72ba3f9e418379c184a435a7cad870e7
8ef757ad7ed52c7da069124bbbb0aba3f793a109
1e42150111e8347da68e7363c1545384609c62db
Eb71e2f83bfc3f30b7fbe132b6c8d07c363ffa23

Cluster 7

2bfc5c38fe25a095161716c17d4828348130ccc0
1fa19f7cf4f37966e131df657dc84b9372fa4e4e
1f9e7110a0b58072466c0d5bece253ea7e6f74e1
1c10fe44429a1cf0b875aef63f61469006f59682
0b55491e45c50bde473e84c329127945cb20133f

Cluster 8

Bd34811e40ac4ac351ace6c117161904b2e08a12
Bc152b1054983a5db6361ce34e246f2fd36e8f82
B528c29632b8e517fbfcf317711636b1b9b874a8
B6a2634e82a206c8759616d5be4c6913400bf9ae
B4b9a6efe0277c7bf7599291912b9ab2c3e171a5

Cluster 9

1953ac3dabc8affd223277564b4c0d9ac1332a9b

Cluster 10

F377028e7946a812948a5c7ba44e954c8fbdfe4a
F1459d32a711cd5e3a33b31334be28ed5ef9ed3d
De942054e5a5b9b011acf04e3de492d951705b28
D53200f073da1091fb7263644d8eec9c4d063b96

Cluster 11

68bcdbfd523c1288487178f0154a272eb316b0fb
59c4702ac6d89d244dc40d56ec6c6491507da8c8
51d1d14772d3087b426e74d91c977f33325c20e0
11f71766d280e26763742573fae133a51dbbca8b
8d15ea7926f2fcfeca8c2df4fc720d9fb428ca9e

Cluster 12

0339b41fe3dc92a1e95779cc5e3ac2b0fff0f48e
198d3a4bc0b8af702a59cfab02f86476a5fcd5b0
91e0eb52ff166781b795dda55ff76363d4146856
89ae64f1a7db8dd4bd24daf587206c50c942d095
87a87eb02940daa8a09e821d439a60e1ea01c039

James Haughom

James Haughom

James Haughom is a Malware Reverse Engineer at Lastline.Prior to Lastline, James supported Incident Response and Intel teams in the federal space as a contractor.This support included Malware Analysis/Reverse Engineering, DFIR, and Tool Development.
James Haughom

Latest posts by James Haughom (see all)

Stefano Ortolani

Stefano Ortolani

Stefano Ortolani is Director of Threat Intelligence at Lastline. Prior to that he was part of the research team in Kaspersky Lab in charge of fostering operations with CERTs, governments, universities, and law enforcement agencies. Before that he earned his Ph.D. in Computer Science from the VU University Amsterdam.
Stefano Ortolani